Scaling Textual Inference to the Web

نویسندگان

  • Stefan Schoenmackers
  • Oren Etzioni
  • Daniel S. Weld
چکیده

Most Web-based Q/A systems work by finding pages that contain an explicit answer to a question. These systems are helpless if the answer has to be inferred from multiple sentences, possibly on different pages. To solve this problem, we introduce the HOLMES system, which utilizes textual inference (TI) over tuples extracted from text. Whereas previous work on TI (e.g., the literature on textual entailment) has been applied to paragraph-sized texts, HOLMES utilizes knowledge-based model construction to scale TI to a corpus of 117 million Web pages. Given only a few minutes, HOLMES doubles recall for example queries in three disparate domains (geography, business, and nutrition). Importantly, HOLMES’s runtime is linear in the size of its input corpus due to a surprising property of many textual relations in the Web corpus—they are “approximately” functional in a well-defined sense.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

EXTRACTION-BASED TEXT SUMMARIZATION USING FUZZY ANALYSIS

Due to the explosive growth of the world-wide web, automatictext summarization has become an essential tool for web users. In this paperwe present a novel approach for creating text summaries. Using fuzzy logicand word-net, our model extracts the most relevant sentences from an originaldocument. The approach utilizes fuzzy measures and inference on theextracted textual information from the docu...

متن کامل

E-learning Content Management an Ontology-based Approach

Scarcity of E-Learning content being a barrier for ELearning is no longer true on today’s Internet. The current concerns are how to effectively annotate and organize available content (both textual and non-textual) to facilitate effective sharing, reusability and customization in an intelligent fashion. In this paper, we explain a component-oriented approach to organize content in an ontology. ...

متن کامل

A Discourse Commitment-Based Framework for Recognizing Textual Entailment

In this paper, we introduce a new framework for recognizing textual entailment which depends on extraction of the set of publiclyheld beliefs – known as discourse commitments – that can be ascribed to the author of a text or a hypothesis. Once a set of commitments have been extracted from a t-h pair, the task of recognizing textual entailment is reduced to the identification of the commitments ...

متن کامل

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. In our extraction algorithm, we inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

متن کامل

طراحی سیستم خبره به منظور تشخیص حمله‌های فیشینگ در بانکداری الکترونیکی

In e-commerce and e-banking environments, one of the most risks or challenges which must be considered, is the risk of online fraud specially phishing attacks. In this study, we use some visual and technical identifies of a phishing web site as parameters to implement an expert system to diagnose this type of attack in electronic banking. In the proposed system, we use 27 different features as ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008